The ATCOSIM Corpus of Non-Prompted Clean Air Traffic Control Speech

Authors

  • Konrad Hofbauer
  • Stefan Petrik
  • Horst Hering
Abstract

Air traffic control (ATC) is based on voice communication between pilots and controllers and uses a highly task- and domain-specific language. For this reason, spoken language technologies for ATC require domain-specific corpora, of which only a few exist to this day. The ATCOSIM Air Traffic Control Simulation Speech corpus is a speech database of non-prompted and clean ATC operator speech. It consists of ten hours of speech data, which were recorded in typical ATC control room conditions during ATC real-time simulations. The database includes orthographic transcriptions and additional information on speakers and recording sessions. The ATCOSIM corpus is publicly available and provided online free of charge. In this paper, we first give an overview of ATC-related corpora and their shortcomings. We then show the difficulties in obtaining operational ATC speech recordings and propose the use of existing ATC real-time simulations. We describe the recording, transcription, production and validation process of the ATCOSIM corpus, and outline an application example for automatic speech recognition in the ATC domain.


Similar resources

ATCOSIM - Air Traffic Control Simulation Speech Corpus Validation Report

The ATCOSIM speech corpus provided by Eurocontrol Experimental Centre has been validated against a list of specified checks. The validation covered completeness, formal checks and manual checks of randomly selected samples. The overall quality of the corpus is good and there should be no problem in using the corpus for speech applications. Minor improvements to the documentation have been propo...


The Effect of Colligational Corpus-based Instruction on Enhancing the Pragmalinguistic Knowledge of Request Speech Act among Iranian Intermediate EFL Learners

This study investigated the effectiveness of colligational corpus-based instruction on enhancing the pragmalinguistic knowledge of speech act of request among Iranian intermediate EFL learners. The objective of the study was to find out whether or not providing students with corpora through using colligational instruction had any significant effects on enhancing their pragmalinguistic knowledge...


Dialogues in Air Traffic Control

We have taken an off-the-shelf, commercial continuous speech recogniser and conducted evaluations for the domain of Air Traffic Control. The language of this domain proved to be quite unrestricted, contrary to our initial intuitions. Our experiments show that constraints typically used by speech recognisers do not provide accurate enough results and need to be augmented with other knowledge sou...


Comparative Analysis of Measured PM Concentrations with AQI and Iranian Clean Air Standard (Case Study: Sari, Iran) 2010-2015

Background & Aims of the Study: Among the different kinds of air pollutants, particulate matter attracts wide attention because of its proven destructive impact on human health and its various sources, such as transportation. The objective of this work is to determine the real value of particulate matter on one of the important and busy roads in Iran (Sari- Qaemshahr ro...


Publication date: 2008